PDB
PDB refers to the Protein Data Bank at rcsb.org, and also to the associated file format. The Protein Data Bank is a freely accessible repository containing over 100,000 proteins and other "biological macromolecular structures".

Entries in the PDB have four-character identifiers like 3SQF. 3SQF is the protein from the Mason-Pfizer Monkey Virus, which was solved based on work by Foldit players.

Many Foldit proteins, especially the ones in revisiting puzzles, have known structures that are available in the PDB. The puzzle results pages on this wiki often mention the PDB ID.

A protein may have more than one entry in the PDB, based on different studies. For example, an earlier solution for the protein in 3SQF was already available as 1NSO.

PDB data formats

The PDB file for a protein contains detailed information about the protein, including the position of each atom in three-dimensional space. A smaller "header" version of the file is also available, which omits the 3D atom coordinates.

The base format for the PDB file is an "old school", line-oriented text file, where tags like SHEET and HELIX at the beginning of each line identify the line's purpose. (PDB files often contain format statements used with the FORTRAN programming language, a clue to their ancient lineage.) The extension ".pdb" is used for this version of the PDB file.

The same information is also available in PDBx/mmCIF format, which is intended to be easier to process programmatically while still being readable by humans. The PDBx/mmCIF format also removes some of the limitations of the old PDB format, which mainly affected large, complex molecules.

Many external tools, such as Jmol and PyMOL, can display proteins based on their PDB ID. These tools can be configured to give views similar to the ones available in Foldit.

Foldit use of PDB

The PDB file numbers segments, or residues, in the same way Foldit does.
For specific proteins, there's often a difference between the two, since a Foldit puzzle may contain only part of a larger protein. For example, a puzzle might not include the first part of the protein, so segment 1 in Foldit might correspond to segment 17 of the PDB entry.

A PDB entry may also contain multiple "chains", or separate sections of protein. A Foldit puzzle typically contains only one chain. Chains are identified by letter: A, B, C, and so on.

The segment information window in Foldit includes the corresponding PDB chain and segment number when available. (To display segment information, hover over a segment and press the Tab key.)

A given protein may appear in the PDB multiple times, and there's no guarantee that the segment numbering and chain will always be the same. So segment 1 of a puzzle might be segment 17 of chain A in one PDB entry, but segment 52 of chain C in another entry.

PDB files use three-character amino acid codes, for example "ALA" for alanine. Single-character codes (such as "A" for alanine) are used in Foldit recipes and also in the FASTA format. FASTA can be used to search for proteins in the PDB and elsewhere. See amino acids for a complete list of amino acid names and codes.

Foldit recipes such as print protein 2.4 and AA Edit 1.2 supply a string of single-character amino acid codes that can be used for searching the PDB. When searching by sequence, using another tool, such as Jpred, may produce better results than searching the PDB directly at rcsb.org.

Limitations on copying

The Foldit community rules limit how players can use information found in the PDB and other sources, stating:

"Copying tertiary and/or quaternary structures from other players or third parties is forbidden."

In other words, attempting to duplicate a protein based on atom coordinates from the PDB is considered cheating.

Internal use

Internally, Foldit uses some version of PDB format. A file with the extension ".OPDB" is downloaded as part of each puzzle.
The OPDB file is encrypted and contains a digital signature which prevents it from being modified.
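
Example: reading residue numbers and codes

To illustrate the chain letters, segment numbering, and amino acid codes described under "Foldit use of PDB", here is a minimal Python sketch. It is not part of Foldit or of any PDB tool. The function names (read_chain, to_sequence, foldit_to_pdb) and the sample data are made up for this example. It assumes the fixed-column layout of ATOM lines in the old ".pdb" text format, looks only at alpha-carbon (CA) atoms so each residue is counted once, and ignores insertion codes. The mapping from Foldit segments to PDB numbers further assumes that the puzzle starts at the first residue listed for the chain and has no gaps, which may not hold for a real puzzle. It reads only residue names and numbers, not atom coordinates.

```python
# Standard three-character to single-character codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

# Made-up sample in PDB text format; not taken from a real PDB entry.
SAMPLE = """\
REMARK   made-up sample, not a real PDB entry
ATOM      1  N   ALA A  17      10.000  12.000  13.000
ATOM      2  CA  ALA A  17      11.000  12.000  13.000
ATOM      3  CA  GLY A  18      14.000  12.000  13.000
ATOM      4  CA  LYS A  19      17.000  12.000  13.000
ATOM      5  CA  SER B  52      20.000  12.000  13.000
"""


def read_chain(pdb_text, chain_id):
    """Return a list of (pdb_residue_number, one_letter_code) for one chain."""
    residues = []
    for line in pdb_text.splitlines():
        # The tag at the start of each line identifies the line's purpose.
        if not line.startswith("ATOM"):
            continue
        # Columns 13-16 hold the atom name, column 22 the chain letter.
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        res_name = line[17:20]      # columns 18-20: three-character code
        res_seq = int(line[22:26])  # columns 23-26: residue number
        # Skip a repeated CA for the same residue number (alternate locations).
        if residues and residues[-1][0] == res_seq:
            continue
        residues.append((res_seq, THREE_TO_ONE.get(res_name, "X")))
    return residues


def to_sequence(residues):
    """Join the single-character codes into a FASTA-style sequence string."""
    return "".join(code for _, code in residues)


def foldit_to_pdb(residues, segment):
    """Map a 1-based Foldit segment number to the PDB residue number.

    Assumes the puzzle begins at the first listed residue of the chain.
    """
    return residues[segment - 1][0]


chain_a = read_chain(SAMPLE, "A")
print(to_sequence(chain_a))       # prints AGK
print(foldit_to_pdb(chain_a, 1))  # prints 17
```

In this sample, segment 1 of the hypothetical puzzle corresponds to residue 17 of chain A, while chain B starts its numbering at 52, which mirrors the point that numbering and chain can differ from one entry to another.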